Several Data Augmentations: Mixup, Cutout, CutMix, and Mosaic in YOLOv4

2023-11-09 15:21 | Source: compiled from the web

1. Comparing the data augmentation methods

2. What does the model learn with CutMix?

3. The CutMix code

4. The Cutout code

5. The Mosaic data augmentation method

CutMix paper: https://arxiv.org/abs/1905.04899v2

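1. Comparing the data augmentation methods

Mixup: blends two randomly chosen samples in a given ratio, and the classification label is split in the same ratio.

Cutout: cuts a randomly chosen region out of the sample and fills it with zero-valued pixels; the classification label is unchanged.

CutMix: also cuts a region out, but instead of filling it with zeros it fills it with the pixels of the same region from another randomly chosen training sample; the classification label is split in proportion to the areas.

Differences among the three: Cutout and CutMix differ only in what fills the cut region. Mixup and CutMix differ in how the two samples are mixed: Mixup interpolates between the two images pixel by pixel, while CutMix cuts out a region and patches it with the other image, so the result does not look like an unnatural blend.

Advantages of CutMix:

(1) No uninformative pixels appear during training, which improves training efficiency.
(2) It keeps the advantage of regional dropout: the model is pushed to attend to the non-discriminative parts of the object.
(3) Because the model has to recognize objects from partial views, and the cut region carries information from another sample, its localization ability improves further.
(4) The mixed image has no unnatural blending artifacts, which helps classification performance.
(5) Training and inference cost stay unchanged.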
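To make the comparison concrete, here is a minimal NumPy sketch of the three operations on toy single-channel arrays. The function names (`mixup`, `cutout`, `cutmix`) and the fixed rectangle are illustrative assumptions, not code from any of the papers; real pipelines sample `lam` and the rectangle at random.

```python
import numpy as np

def mixup(img_a, img_b, lam):
    """Blend two images pixel by pixel; label weights are (lam, 1 - lam)."""
    mixed = lam * img_a + (1.0 - lam) * img_b
    return mixed, (lam, 1.0 - lam)

def cutout(img, y1, y2, x1, x2):
    """Zero out a rectangle; the label is unchanged."""
    out = img.copy()
    out[y1:y2, x1:x2] = 0.0
    return out

def cutmix(img_a, img_b, y1, y2, x1, x2):
    """Paste the rectangle from img_b into img_a; label weights follow area."""
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    h, w = img_a.shape[:2]
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return out, (lam, 1.0 - lam)

# Toy 4x4 "images": A is all ones, B is all twos.
img_a = np.ones((4, 4), dtype=np.float32)
img_b = np.full((4, 4), 2.0, dtype=np.float32)

mixed, mixup_weights = mixup(img_a, img_b, lam=0.75)        # every pixel is 1.25
holed = cutout(img_a, 0, 2, 0, 2)                           # top-left 2x2 is 0
patched, cutmix_weights = cutmix(img_a, img_b, 0, 2, 0, 2)  # top-left 2x2 is 2
```

In this toy case the 2x2 patch covers a quarter of the image, so CutMix assigns label weights (0.75, 0.25), the same split Mixup gets from `lam = 0.75`, but every pixel in the CutMix result still comes from exactly one source image.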

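2. What does the model learn with CutMix?

The authors answer this with heatmaps. CutMix lets the model recognize two objects from their partial views within a single image, which improves training efficiency. The figure in the paper shows that Cutout makes the model focus on the less discriminative regions of the object (the belly), but part of the image carries no information at all, which hurts training efficiency. Mixup uses the information in every pixel, but it introduces very unnatural artifacts.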

3. The CutMix code

Code: https://github.com/clovaai/CutMix-PyTorch

Generating the crop region

"""输入为：样本的size和生成的随机lamda值""" def rand_bbox(size, lam): W = size[2] H = size[3] """1.论文里的公式2，求出B的rw,rh""" cut_rat = np.sqrt(1. - lam) cut_w = np.int(W * cut_rat) cut_h = np.int(H * cut_rat) # uniform """2.论文里的公式2，求出B的rx,ry（bbox的中心点）""" cx = np.random.randint(W) cy = np.random.randint(H) #限制坐标区域不超过样本大小 bbx1 = np.clip(cx - cut_w // 2, 0, W) bby1 = np.clip(cy - cut_h // 2, 0, H) bbx2 = np.clip(cx + cut_w // 2, 0, W) bby2 = np.clip(cy + cut_h // 2, 0, H) """3.返回剪裁B区域的坐标值""" return bbx1, bby1, bbx2, bby2

Overall flow

    # train.py, lines 220-244
    # Excerpt: train_loader, model, criterion, args, data_time, and end are
    # defined elsewhere in train.py.
    for i, (input, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        input = input.cuda()
        target = target.cuda()

        r = np.random.rand(1)
        if args.beta > 0 and r < args.cutmix_prob:
            # generate mixed sample
            # 1. Sample lambda from a Beta(beta, beta) distribution.
            lam = np.random.beta(args.beta, args.beta)
            # 2. Pair each sample with another one from the same batch.
            rand_index = torch.randperm(input.size()[0]).cuda()
            target_a = target              # labels of the batch
            target_b = target[rand_index]  # labels of the shuffled batch
            # 3. Generate the crop region B.
            bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
            # 4. Replace region B of sample A with region B of sample B.
            input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
            # adjust lambda to exactly match pixel ratio
            # 5. Recompute lambda from the coordinates of the crop box.
            lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size()[-1] * input.size()[-2]))
            # compute output
            # 6. Feed the newly generated training samples to the model.
            output = model(input)
            # 7. Weight the two losses by lambda.
            loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
        else:
            # compute output
            output = model(input)
            loss = criterion(output, target)

4. The Cutout code

    import torch
    import numpy as np

    class Cutout(object):
        """Randomly mask out one or more patches from an image.

        Args:
            n_holes (int): Number of patches to cut out of each image.
            length (int): The length (in pixels) of each square patch.
        """
        def __init__(self, n_holes, length):
            self.n_holes = n_holes
            self.length = length

        def __call__(self, img):
            """
            Args:
                img (Tensor): Tensor image of size (C, H, W).
            Returns:
                Tensor: Image with n_holes of dimension length x length cut out of it.
            """
            h = img.size(1)
            w = img.size(2)

            mask = np.ones((h, w), np.float32)

            for n in range(self.n_holes):
                # Pick a random center, then clip the square patch to the image.
                y = np.random.randint(h)
                x = np.random.randint(w)

                y1 = np.clip(y - self.length // 2, 0, h)
                y2 = np.clip(y + self.length // 2, 0, h)
                x1 = np.clip(x - self.length // 2, 0, w)
                x2 = np.clip(x + self.length // 2, 0, w)

                mask[y1: y2, x1: x2] = 0.

            # Broadcast the (H, W) mask over the channels and apply it.
            mask = torch.from_numpy(mask)
            mask = mask.expand_as(img)
            img = img * mask

            return img

5. The Mosaic data augmentation method

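YOLOv4's Mosaic augmentation builds on CutMix, and the two are similar in principle. CutMix stitches two images together, whereas Mosaic uses four. According to the paper, its big advantage is that it enriches the backgrounds of the objects being detected, and batch normalization computes its statistics over four images' worth of data at once.

Implementation outline

1. Read four images at a time.

2. Flip, scale, and color-jitter each of the four images, then place them in the four quadrant positions.

3. Combine the images and combine the boxes.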
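The steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not the article's implementation: it assumes the four images already have the same size, skips the flipping, scaling, and color jitter of step 2, and uses hypothetical helper names (`clip_box_to_region`, `simple_mosaic`). Boxes are `[x1, y1, x2, y2, class]`.

```python
import numpy as np

def clip_box_to_region(box, region):
    """Clip [x1, y1, x2, y2, cls] to region (rx1, ry1, rx2, ry2); None if nothing is left."""
    x1, y1, x2, y2, cls = box
    rx1, ry1, rx2, ry2 = region
    x1, x2 = max(x1, rx1), min(x2, rx2)
    y1, y2 = max(y1, ry1), min(y2, ry2)
    if x2 <= x1 or y2 <= y1:
        return None
    return [x1, y1, x2, y2, cls]

def simple_mosaic(images, boxes, cutx, cuty):
    """Combine four same-sized images and their boxes around the point (cutx, cuty)."""
    h, w = images[0].shape[:2]
    # Quadrant order: top-left, bottom-left, bottom-right, top-right.
    regions = [
        (0, 0, cutx, cuty),
        (0, cuty, cutx, h),
        (cutx, cuty, w, h),
        (cutx, 0, w, cuty),
    ]
    canvas = np.zeros_like(images[0])
    merged = []
    for img, img_boxes, region in zip(images, boxes, regions):
        rx1, ry1, rx2, ry2 = region
        # Copy this image's quadrant onto the canvas.
        canvas[ry1:ry2, rx1:rx2] = img[ry1:ry2, rx1:rx2]
        # Keep only the part of each box that lies inside the quadrant.
        for box in img_boxes:
            clipped = clip_box_to_region(box, region)
            if clipped is not None:
                merged.append(clipped)
    return canvas, merged

# Four constant 8x8 RGB images with values 1..4, each with one full-image box.
images = [np.full((8, 8, 3), v, dtype=np.uint8) for v in (1, 2, 3, 4)]
boxes = [[[0, 0, 8, 8, v]] for v in (1, 2, 3, 4)]

mosaic, mosaic_boxes = simple_mosaic(images, boxes, cutx=3, cuty=5)
```

Each full-image box is cut down to its own quadrant, so the four boxes tile the 8x8 canvas without overlapping. A production version would also drop boxes that become too small after clipping.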

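Full code

    from PIL import Image, ImageDraw
    import numpy as np
    from matplotlib.colors import rgb_to_hsv, hsv_to_rgb
    import math

    def rand(a=0, b=1):
        return np.random.rand()*(b-a) + a

    def merge_bboxes(bboxes, cutx, cuty):
        # NOTE: page extraction stripped every span between a "<" and the next
        # ">" in this function. The quadrant tests and clipping statements are
        # restored from the quadrant layout used when the images are pasted;
        # the 5-pixel minimum size and the final append lines are assumptions.
        merge_bbox = []
        for i in range(len(bboxes)):
            for box in bboxes[i]:
                tmp_box = []
                x1, y1, x2, y2 = box[0], box[1], box[2], box[3]

                if i == 0:  # top-left quadrant
                    if y1 > cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2 - x1 < 5:
                            continue

                if i == 1:  # bottom-left quadrant
                    if y2 < cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2 - x1 < 5:
                            continue

                if i == 2:  # bottom-right quadrant
                    if y2 < cuty or x2 < cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2 - x1 < 5:
                            continue

                if i == 3:  # top-right quadrant
                    if y1 > cuty or x2 < cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2 - x1 < 5:
                            continue

                tmp_box.append(x1)
                tmp_box.append(y1)
                tmp_box.append(x2)
                tmp_box.append(y2)
                tmp_box.append(box[-1])
                merge_bbox.append(tmp_box)
        return merge_bbox

    # [Extraction gap: the "def get_random_data(...)" line and the first part of
    # its body are missing from the source. The missing part covers steps 1 and 2
    # of the outline and defines h, w, min_offset_x, min_offset_y, image_datas,
    # box_datas, and the per-image image_data, box, box_w, and box_h used below.
    # The surviving code is shown as a fragment; its indentation is inferred.]

            # Inside the loop over the four annotation lines:
            box = box[np.logical_and(box_w>1, box_h>1)]  # start of this line restored
            box_data = np.zeros((len(box),5))
            box_data[:len(box)] = box

            image_datas.append(image_data)
            box_datas.append(box_data)

            # Debug view: draw the boxes on each transformed image.
            img = Image.fromarray((image_data*255).astype(np.uint8))
            for j in range(len(box_data)):
                thickness = 3
                left, top, right, bottom = box_data[j][0:4]
                draw = ImageDraw.Draw(img)
                for i in range(thickness):
                    draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
            img.show()

        # Cut the four images and stitch the quadrants together.
        cutx = np.random.randint(int(w*min_offset_x), int(w*(1 - min_offset_x)))
        cuty = np.random.randint(int(h*min_offset_y), int(h*(1 - min_offset_y)))

        new_image = np.zeros([h,w,3])
        new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :]
        new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :]
        new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :]
        new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :]

        # Clip and merge the boxes to match the stitched image.
        new_boxes = merge_bboxes(box_datas, cutx, cuty)

        return new_image, new_boxes

    def normal_(annotation_line, input_shape):
        '''random preprocessing for real-time data augmentation'''
        line = annotation_line.split()
        image = Image.open(line[0])
        box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])

        # Flip the image horizontally and mirror the x coordinates of the boxes.
        iw, ih = image.size
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        box[:, [0,2]] = iw - box[:, [2,0]]

        return image, box

    if __name__ == "__main__":
        with open("2007_train.txt") as f:
            lines = f.readlines()
        # Leave room for four consecutive lines (the original could pick an
        # index too close to the end of the file).
        a = np.random.randint(0, len(lines) - 3)

        # Debug view of the individual images (disabled):
        # index = 0
        # line_all = lines[a:a+4]
        # for line in line_all:
        #     image_data, box_data = normal_(line,[416,416])
        #     img = image_data
        #     for j in range(len(box_data)):
        #         thickness = 3
        #         left, top, right, bottom = box_data[j][0:4]
        #         draw = ImageDraw.Draw(img)
        #         for i in range(thickness):
        #             draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
        #     img.show()
        #     # img.save(str(index)+"box.jpg")
        #     index = index+1

        line = lines[a:a+4]
        image_data, box_data = get_random_data(line,[416,416])
        img = Image.fromarray((image_data*255).astype(np.uint8))
        for j in range(len(box_data)):
            thickness = 3
            left, top, right, bottom = box_data[j][0:4]
            draw = ImageDraw.Draw(img)
            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
        img.show()
        # img.save("box_all.jpg")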

Reference: "Data Augmentation: Mixup, Cutout, CutMix | Mosaic", Jianshu
